AITopics | byzantine failure

Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent

Neural Information Processing SystemsMar-17-2026, 18:54:39 GMT

We study the resilience to Byzantine failures of distributed implementations of Stochastic Gradient Descent (SGD). So far, distributed machine learning frameworks have largely ignored the possibility of failures, especially arbitrary (i.e., Byzantine) ones. Causes of failures include software bugs, network asynchrony, biases in local datasets, as well as attackers trying to compromise the entire system. Assuming a set of $n$ workers, up to $f$ being Byzantine, we ask how resilient can SGD be, without limiting the dimension, nor the size of the parameter space. We first show that no gradient aggregation rule based on a linear combination of the vectors proposed by the workers (i.e, current approaches) tolerates a single Byzantine failure. We then formulate a resilience property of the aggregation rule capturing the basic requirements to guarantee convergence despite $f$ Byzantine workers. We propose \emph{Krum}, an aggregation rule that satisfies our resilience property, which we argue is the first provably Byzantine-resilient algorithm for distributed SGD. We also report on experimental evaluations of Krum.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Add feedback

Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent

Neural Information Processing SystemsNov-21-2025, 16:16:52 GMT

We study the resilience to Byzantine failures of distributed implementations of Stochastic Gradient Descent (SGD). So far, distributed machine learning frameworks have largely ignored the possibility of failures, especially arbitrary (i.e., Byzantine) ones. Causes of failures include software bugs, network asynchrony, biases in local datasets, as well as attackers trying to compromise the entire system. Assuming a set of $n$ workers, up to $f$ being Byzantine, we ask how resilient can SGD be, without limiting the dimension, nor the size of the parameter space. We first show that no gradient aggregation rule based on a linear combination of the vectors proposed by the workers (i.e, current approaches) tolerates a single Byzantine failure. We then formulate a resilience property of the aggregation rule capturing the basic requirements to guarantee convergence despite $f$ Byzantine workers. We propose \emph{Krum}, an aggregation rule that satisfies our resilience property, which we argue is the first provably Byzantine-resilient algorithm for distributed SGD. We also report on experimental evaluations of Krum.

byzantine tolerant gradient descent, machine learning, name change, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Add feedback

Byzantine-tolerant distributed learning of finite mixture models

Zhang, Qiong, Chen, Jiahua

arXiv.org Machine LearningJul-18-2024

This paper proposes two split-and-conquer (SC) learning estimators for finite mixture models that are tolerant to Byzantine failures. In SC learning, individual machines obtain local estimates, which are then transmitted to a central server for aggregation. During this communication, the server may receive malicious or incorrect information from some local machines, a scenario known as Byzantine failures. While SC learning approaches have been devised to mitigate Byzantine failures in statistical models with Euclidean parameters, developing Byzantine-tolerant methods for finite mixture models with non-Euclidean parameters requires a distinct strategy. Our proposed distance-based methods are hyperparameter tuning free, unlike existing methods, and are resilient to Byzantine failures while achieving high statistical efficiency. We validate the effectiveness of our methods both theoretically and empirically via experiments on simulated and real data from machine learning applications for digit recognition. The code for the experiment can be found at https://github.com/SarahQiong/RobustSCGMM.

byzantine failure, estimator, mixture model, (16 more...)

arXiv.org Machine Learning

2407.1398

Country:

Asia > Middle East > Jordan (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
(2 more...)

Add feedback

Variance Reduced Median-of-Means Estimator for Byzantine-Robust Distributed Inference

Tu, Jiyuan, Liu, Weidong, Mao, Xiaojun, Chen, Xi

arXiv.org Machine LearningMar-4-2021

This paper develops an efficient distributed inference algorithm, which is robust against a moderate fraction of Byzantine nodes, namely arbitrary and possibly adversarial machines in a distributed learning system. In robust statistics, the median-of-means (MOM) has been a popular approach to hedge against Byzantine failures due to its ease of implementation and computational efficiency. However, the MOM estimator has the shortcoming in terms of statistical efficiency. The first main contribution of the paper is to propose a variance reduced median-of-means (VRMOM) estimator, which improves the statistical efficiency over the vanilla MOM estimator and is computationally as efficient as the MOM. Based on the proposed VRMOM estimator, we develop a general distributed inference algorithm that is robust against Byzantine failures. Theoretically, our distributed algorithm achieves a fast convergence rate with only a constant number of rounds of communications. We also provide the asymptotic normality result for the purpose of statistical inference. To the best of our knowledge, this is the first normality result in the setting of Byzantine-robust distributed learning. The simulation results are also presented to illustrate the effectiveness of our method.

byzantine machine, estimator, vrmom estimator, (13 more...)

arXiv.org Machine Learning

2103.0286

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Communication-efficient Byzantine-robust distributed learning with statistical guarantee

Zhou, Xingcai, Chang, Le, Xu, Pengfei, Lv, Shaogao

arXiv.org Machine LearningFeb-27-2021

Communication efficiency and robustness are two major issues in modern distributed learning framework. This is due to the practical situations where some computing nodes may have limited communication power or may behave adversarial behaviors. To address the two issues simultaneously, this paper develops two communication-efficient and robust distributed learning algorithms for convex problems. Our motivation is based on surrogate likelihood framework and the median and trimmed mean operations. Particularly, the proposed algorithms are provably robust against Byzantine failures, and also achieve optimal statistical rates for strong convex losses and convex (non-smooth) penalties. For typical statistical models such as generalized linear models, our results show that statistical errors dominate optimization errors in finite iterations. Simulated and real data experiments are conducted to demonstrate the numerical performance of our algorithms.

algorithm, theorem 3, worker machine, (16 more...)

arXiv.org Machine Learning

2103.00373

Country:

Asia > Middle East > Jordan (0.05)
Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent

Blanchard, Peva, Mhamdi, El Mahdi El, Guerraoui, Rachid, Stainer, Julien

Neural Information Processing SystemsFeb-14-2020, 04:56:37 GMT

We study the resilience to Byzantine failures of distributed implementations of Stochastic Gradient Descent (SGD). So far, distributed machine learning frameworks have largely ignored the possibility of failures, especially arbitrary (i.e., Byzantine) ones. Causes of failures include software bugs, network asynchrony, biases in local datasets, as well as attackers trying to compromise the entire system. Assuming a set of $n$ workers, up to $f$ being Byzantine, we ask how resilient can SGD be, without limiting the dimension, nor the size of the parameter space. We first show that no gradient aggregation rule based on a linear combination of the vectors proposed by the workers (i.e, current approaches) tolerates a single Byzantine failure.

aggregation rule, byzantine tolerant gradient descent, machine learning, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

BRIDGE: Byzantine-resilient Decentralized Gradient Descent

Yang, Zhixiong, Bajwa, Waheed U.

arXiv.org Machine LearningAug-21-2019

Decentralized optimization techniques are increasingly being used to learn machine learning models from data distributed over multiple locations without gathering the data at any one location. Unfortunately, methods that are designed for faultless networks typically fail in the presence of node failures. In particular, Byzantine failures---corresponding to the scenario in which faulty/compromised nodes are allowed to arbitrarily deviate from an agreed-upon protocol---are the hardest to safeguard against in decentralized settings. This paper introduces a Byzantine-resilient decentralized gradient descent (BRIDGE) method for decentralized learning that, when compared to existing works, is more efficient and scalable in higher-dimensional settings and that is deployable in networks having topologies that go beyond the star topology. The main contributions of this work include theoretical analysis of BRIDGE for strongly convex learning objectives and numerical experiments demonstrating the efficacy of BRIDGE for both convex and nonconvex learning tasks.

artificial intelligence, machine learning, node, (17 more...)

arXiv.org Machine Learning

1908.08098

Country: North America > United States (0.93)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)

Add feedback

Generalized Byzantine-tolerant SGD

Xie, Cong, Koyejo, Oluwasanmi, Gupta, Indranil

arXiv.org Machine LearningMar-23-2018

We propose three new robust aggregation rules for distributed synchronous Stochastic Gradient Descent (SGD) under a general Byzantine failure model. The attackers can arbitrarily manipulate the data transferred between the servers and the workers in the parameter server (PS) architecture. We prove the Byzantine resilience properties of these aggregation rules. Empirical analysis shows that the proposed techniques outperform current approaches for realistic use cases and Byzantine attack scenarios.

artificial intelligence, dimension, machine learning, (13 more...)

arXiv.org Machine Learning

1802.10116

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates

Yin, Dong, Chen, Yudong, Ramchandran, Kannan, Bartlett, Peter

arXiv.org Machine LearningMar-5-2018

In large-scale distributed learning, security issues have become increasingly important. Particularly in a decentralized environment, some computing units may behave abnormally, or even exhibit Byzantine failures---arbitrary and potentially adversarial behavior. In this paper, we develop distributed learning algorithms that are provably robust against such failures, with a focus on achieving optimal statistical performance. A main result of this work is a sharp analysis of two robust distributed gradient descent algorithms based on median and trimmed mean operations, respectively. We prove statistical error rates for three kinds of population loss functions: strongly convex, non-strongly convex, and smooth non-convex. In particular, these algorithms are shown to achieve order-optimal statistical error rates for strongly convex losses. To achieve better communication efficiency, we further propose a median-based distributed algorithm that is provably robust, and uses only one communication round. For strongly convex quadratic loss, we show that this algorithm achieves the same optimal error rate as the robust distributed gradient descent algorithms.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1803.01498

Genre: Research Report (0.64)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent

Chen, Yudong, Su, Lili, Xu, Jiaming

arXiv.org Machine LearningOct-22-2017

We consider the problem of distributed statistical machine learning in adversarial settings, where some unknown and time-varying subset of working machines may be compromised and behave arbitrarily to prevent an accurate model from being learned. This setting captures the potential adversarial attacks faced by Federated Learning -- a modern machine learning paradigm that is proposed by Google researchers and has been intensively studied for ensuring user privacy. Formally, we focus on a distributed system consisting of a parameter server and $m$ working machines. Each working machine keeps $N/m$ data samples, where $N$ is the total number of samples. The goal is to collectively learn the underlying true model parameter of dimension $d$. In classical batch gradient descent methods, the gradients reported to the server by the working machines are aggregated via simple averaging, which is vulnerable to a single Byzantine failure. In this paper, we propose a Byzantine gradient descent method based on the geometric median of means of the gradients. We show that our method can tolerate $q \le (m-1)/2$ Byzantine failures, and the parameter estimate converges in $O(\log N)$ rounds with an estimation error of $\sqrt{d(2q+1)/N}$, hence approaching the optimal error rate $\sqrt{d/N}$ in the centralized and failure-free setting. The total computational complexity of our algorithm is of $O((Nd/m) \log N)$ at each working machine and $O(md + kd \log^3 N)$ at the central server, and the total communication cost is of $O(m d \log N)$. We further provide an application of our general results to the linear regression problem. A key challenge arises in the above problem is that Byzantine failures create arbitrary and unspecified dependency among the iterations and the aggregated gradients. We prove that the aggregated gradient converges uniformly to the true gradient function.

artificial intelligence, geometric median, machine learning, (15 more...)

arXiv.org Machine Learning

1705.05491

Country: North America > United States (0.93)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.92)

Add feedback